Skip to content

Conversation

@cnettel
Copy link

@cnettel cnettel commented May 31, 2025

This fix resolves retrieving the constant from a load instruction when the large address LLA mode is used. Depending on build settings, the constant pool can be resolved already here, or later during linking. If llc is used on a short snippet of IR, the former tends to be the case, while in a full project, it can be the latter. The extracted value is used for computing known bits in instructions then using those constants as operands.

The LLA instruction has a single operand, which here is the actual constant pool reference that we want to unpack.

The lambda helper GetSupportedConstantPool checks that the SDValue passed in is ConstantPoolSDNode. But we've just checked that the opcode is RISCVISD::LLA. Thus, for the LLA case, this retrieval always silently fails. Operand 0 of the instruction, on the other hand, is frequently a ConstantPoolSDNode, which can be verified by dumping the DAG.

This is also in line with the later case in the method, for RISCVISD::ADD_LO, although that is clouded a bit by the glue between LO and HI. For both subinstructions, the actual constant pool SD node is found in an operand.

In short, the wrong object was checked and the retrieval always failed, gracefully. By referring to the right object, proper known bits support etc is enabled. If the operand is somehow incompatible, the existing checks in GetSupportedConstantPool should kick in and thus be safe.

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Carl Nettelblad (cnettel)

Changes

This fix resolves retrieving the constant from a load instruction when the large address LLA mode is used. Depending on build settings, the constant pool can be resolved already here, or later during linking. If llc is used on a short snippet of IR, the former tends to be the case, while in a full project, it can be the latter. The extracted value is used for computing known bits in instructions then using those constants as operands.

The LLA instruction has a single operand, which here is the actual constant pool reference that we want to unpack.

The lambda helper GetSupportedConstantPool checks that the SDValue passed in is ConstantPoolSDNode. But we've just checked that the opcode is RISCVISD::LLA. Thus, for the LLA case, this retrieval always silently fails. Operand 0 of the instruction, on the other hand, is frequently a ConstantPoolSDNode, which can be verified by dumping the DAG.

This is also in line with the later case in the method, for RISCVISD::ADD_LO, although that is clouded a bit by the glue between LO and HI. For both subinstructions, the actual constant pool SD node is found in an operand.

In short, the wrong object was checked and the retrieval always failed, gracefully. By referring to the right object, proper known bits support etc is enabled. If the operand is somehow incompatible, the existing checks in GetSupportedConstantPool should kick in and thus be safe.


Full diff: https://github.com/llvm/llvm-project/pull/142292.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+1-1)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 6e8e4ac1c6a95..9d337a5a20f05 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -20813,7 +20813,7 @@ RISCVTargetLowering::getTargetConstantFromLoad(LoadSDNode *Ld) const {
 
   // Simple case, LLA.
   if (Ptr.getOpcode() == RISCVISD::LLA) {
-    auto *CNode = GetSupportedConstantPool(Ptr);
+    auto *CNode = GetSupportedConstantPool(Ptr.getOperand(0));
     if (!CNode || CNode->getTargetFlags() != 0)
       return nullptr;
 

@jrtc27
Copy link
Collaborator

jrtc27 commented May 31, 2025

Do you have a test case to illustrate the bug you're trying to fix?

@cnettel
Copy link
Author

cnettel commented May 31, 2025

Do you have a test case to illustrate the bug you're trying to fix?

I apologize for not providing one straight away. I only get the behavior when LLA was used for constant pools in a larger project (involving LLVMCPU lowering with IREE), with a quite complex build environment. In fact, I stumbled on this from the very fact that for small test programs, I saw much better optimization than on the full thing, despite the optimization flags being identical.

This one-line fix brings those two cases in line with each other, but it is not a minimal reproduction. I'm not familiar enough with the various memory, linkage, and relocations models of RISC-V to know when LLA will be triggered for constant pools. In any such scenario where a late stage KnownBits optimization is possible, the behavior should be triggered.

The existing code is buggy in the sense that the return value from GetSupportedConstantPool will always return nullptr in the line affected by the commit. As things stand, Ptr is checked to be a ConstantPoolSDNode, but we only enter that block if Ptr has been established to be a RISCVISD::LLA opcode.

@mshockwave
Copy link
Member

Do you have a test case to illustrate the bug you're trying to fix?

I apologize for not providing one straight away. I only get the behavior when LLA was used for constant pools in a larger project (involving LLVMCPU lowering with IREE), with a quite complex build environment. In fact, I stumbled on this from the very fact that for small test programs, I saw much better optimization than on the full thing, despite the optimization flags being identical.

This one-line fix brings those two cases in line with each other, but it is not a minimal reproduction. I'm not familiar enough with the various memory, linkage, and relocations models of RISC-V to know when LLA will be triggered for constant pools. In any such scenario where a late stage KnownBits optimization is possible, the behavior should be triggered.

The existing code is buggy in the sense that the return value from GetSupportedConstantPool will always return nullptr in the line affected by the commit. As things stand, Ptr is checked to be a ConstantPoolSDNode, but we only enter that block if Ptr has been established to be a RISCVISD::LLA opcode.

You can dump the LLVM IR generated by IREE with --iree-hal-dump-executable-intermediates-to. If you can reproduce the error with llc and those LLVM IR, perhaps you can use llvm-reduce to further shrink the reproducer code.

@cnettel
Copy link
Author

cnettel commented Jun 2, 2025

Do you have a test case to illustrate the bug you're trying to fix?

I apologize for not providing one straight away. I only get the behavior when LLA was used for constant pools in a larger project (involving LLVMCPU lowering with IREE), with a quite complex build environment. In fact, I stumbled on this from the very fact that for small test programs, I saw much better optimization than on the full thing, despite the optimization flags being identical.
This one-line fix brings those two cases in line with each other, but it is not a minimal reproduction. I'm not familiar enough with the various memory, linkage, and relocations models of RISC-V to know when LLA will be triggered for constant pools. In any such scenario where a late stage KnownBits optimization is possible, the behavior should be triggered.
The existing code is buggy in the sense that the return value from GetSupportedConstantPool will always return nullptr in the line affected by the commit. As things stand, Ptr is checked to be a ConstantPoolSDNode, but we only enter that block if Ptr has been established to be a RISCVISD::LLA opcode.

You can dump the LLVM IR generated by IREE with --iree-hal-dump-executable-intermediates-to. If you can reproduce the error with llc and those LLVM IR, perhaps you can use llvm-reduce to further shrink the reproducer code.

That's the thing. I was already using llc with intermediates when I discovered this, since I wanted to understand why I got strange results for optimizations involving some constants.

I haven't found any settings in llc choosing this codepath, but it's consistently used for the build generating the .ll intermediates in the .o and .s intermediates found in the same directory. In the corresponding cases built within IREE, KnownBits for constants loaded for constant pools were evaluated to all ?.

It's not only a matter of the LLVM IR stream itself, but some other build setting passed on by IREE which I haven't identified. Still, the code generated is valid, it's just that the constant pool strategy used seems to be different. My hypothesis is that my llc build ends up in the codepath using a RISCVISD::ADD_LO/RISCVISD::HI pair, while the IREE build is still abstracting this to RISCVISD::LLA, hitting the bug. If someone has some suggestions on flags to llc that can trigger one or the other, I'll be happy to look into it more.

Again, the actual bug is that the current code assumes the RISCVISD::LLA node itself to also be a ConstantPoolSDNode, which is impossible. The ConstantPoolSDNode is instead an operand to that node. The current code will never provide a non-null result if the constant pool reference is done using RISCVISD::LLA.

@topperc
Copy link
Collaborator

topperc commented Jun 2, 2025

Do you have a test case to illustrate the bug you're trying to fix?

I apologize for not providing one straight away. I only get the behavior when LLA was used for constant pools in a larger project (involving LLVMCPU lowering with IREE), with a quite complex build environment. In fact, I stumbled on this from the very fact that for small test programs, I saw much better optimization than on the full thing, despite the optimization flags being identical.
This one-line fix brings those two cases in line with each other, but it is not a minimal reproduction. I'm not familiar enough with the various memory, linkage, and relocations models of RISC-V to know when LLA will be triggered for constant pools. In any such scenario where a late stage KnownBits optimization is possible, the behavior should be triggered.
The existing code is buggy in the sense that the return value from GetSupportedConstantPool will always return nullptr in the line affected by the commit. As things stand, Ptr is checked to be a ConstantPoolSDNode, but we only enter that block if Ptr has been established to be a RISCVISD::LLA opcode.

You can dump the LLVM IR generated by IREE with --iree-hal-dump-executable-intermediates-to. If you can reproduce the error with llc and those LLVM IR, perhaps you can use llvm-reduce to further shrink the reproducer code.

That's the thing. I was already using llc with intermediates when I discovered this, since I wanted to understand why I got strange results for optimizations involving some constants.

I haven't found any settings in llc choosing this codepath, but it's consistently used for the build generating the .ll intermediates in the .o and .s intermediates found in the same directory. In the corresponding cases built within IREE, KnownBits for constants loaded for constant pools were evaluated to all ?.

It's not only a matter of the LLVM IR stream itself, but some other build setting passed on by IREE which I haven't identified. Still, the code generated is valid, it's just that the constant pool strategy used seems to be different. My hypothesis is that my llc build ends up in the codepath using a RISCVISD::ADD_LO/RISCVISD::HI pair, while the IREE build is still abstracting this to RISCVISD::LLA, hitting the bug. If someone has some suggestions on flags to llc that can trigger one or the other, I'll be happy to look into it more.

Again, the actual bug is that the current code assumes the RISCVISD::LLA node itself to also be a ConstantPoolSDNode, which is impossible. The ConstantPoolSDNode is instead an operand to that node. The current code will never provide a non-null result if the constant pool reference is done using RISCVISD::LLA.

HI/ADD_LO is used for the "small" codemodel. LLA is used for PIC and the medium code model. The relevant flags are -code-model=small vs -code-model=medium or -relocation-model=pic.

@cnettel
Copy link
Author

cnettel commented Jun 2, 2025

Again, the actual bug is that the current code assumes the RISCVISD::LLA node itself to also be a ConstantPoolSDNode, which is impossible. The ConstantPoolSDNode is instead an operand to that node. The current code will never provide a non-null result if the constant pool reference is done using RISCVISD::LLA.

HI/ADD_LO is used for the "small" codemodel. LLA is used for PIC and the medium code model. The relevant flags are -code-model=small vs -code-model=medium or -relocation-model=pic.

Thanks. I thought I tried pic, since that made sense from other platforms, but maybe I didn't check the results enough.

@cnettel
Copy link
Author

cnettel commented Jun 2, 2025

It took some work. This isn's quite minimal, but you need to fool LLVM a bit to store constants without inlining all the math, even at -O0.

The premise is that ori can be replaced by addi if the compiler can prove that the low bits are zero. This is done late during instruction selection.

The following IR:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-unknown"

; Function Attrs: mustprogress noinline nounwind optnone
define dso_local noundef i64 @_Z4sum2j(i32 noundef signext %c, i32 noundef signext %d) #0 {
entry:
  %or1 = or i64 -9191740941672644608, 4096
  %or2 = or i64 -9191740941672644608, 8192
  %or3 = or i64 -9191740941672644608, 16384
  %conv = zext i32 %c to i64
  %donv = zext i32 %d to i64
  %3 = mul i64 %or1, %conv
  %4 = mul i64 %or2, %donv
  %5 = mul i64 %or3, %conv
  %6 = add i64 %3, %4
  %7 = add i64 %6, %5
  %8 = or i64 %5, 127
  %9 = mul i64 %3, %8
  %add = add i64 -9191740941672644608, %9
  %add2 = add i64 %add, %conv
  ret i64 %add2
}

generates the following assembler if run through llc -mcpu=mips-p8700 -O0 -relocation-model=pic in the patched version:

        .attribute      4, 16
        .attribute      5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zba1p0_zbb1p0"
        .file   "testbug.ll"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3, 0x0                          # -- Begin function _Z4sum2j
.LCPI0_0:
        .quad   -9191740941672628224            # 0x8070605040304000
.LCPI0_1:
        .quad   -9191740941672640512            # 0x8070605040301000
        .text
        .globl  _Z4sum2j
        .p2align        1
        .type   _Z4sum2j,@function
_Z4sum2j:                               # @_Z4sum2j
.L_Z4sum2j$local:
        .type   .L_Z4sum2j$local,@function
        .cfi_startproc
# %bb.0:                                # %entry
                                        # kill: def $x11 killed $x10
        zext.w  a1, a0
.Lpcrel_hi0:
        auipc   a2, %pcrel_hi(.LCPI0_0)
        addi    a2, a2, %pcrel_lo(.Lpcrel_hi0)
        ld      a2, 0(a2)
        mul     a2, a1, a2
        addi    a2, a2, 127
        mul     a1, a1, a2
.Lpcrel_hi1:
        auipc   a2, %pcrel_hi(.LCPI0_1)
        addi    a2, a2, %pcrel_lo(.Lpcrel_hi1)
        ld      a2, 0(a2)
        mul     a1, a1, a2
        add.uw  a0, a0, a1
        lui     a1, 1015920
        addiw   a1, a1, 1541
        slli    a1, a1, 16
        addi    a1, a1, 1027
        slli    a1, a1, 20
        add     a0, a0, a1
        ret
.Lfunc_end0:
        .size   _Z4sum2j, .Lfunc_end0-_Z4sum2j
        .size   .L_Z4sum2j$local, .Lfunc_end0-_Z4sum2j
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

This is correct. The same command, on the current main branch, generates the ori instruction instead, showing that or_is_add in RISCVInstrInfo isn't able to see the known bits:

        .attribute      4, 16
        .attribute      5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0_zaamo1p0_zalrsc1p0_zba1p0_zbb1p0"
        .file   "testbug.ll"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3, 0x0                          # -- Begin function _Z4sum2j
.LCPI0_0:
        .quad   -9191740941672628224            # 0x8070605040304000
.LCPI0_1:
        .quad   -9191740941672640512            # 0x8070605040301000
        .text
        .globl  _Z4sum2j
        .p2align        1
        .type   _Z4sum2j,@function
_Z4sum2j:                               # @_Z4sum2j
.L_Z4sum2j$local:
        .type   .L_Z4sum2j$local,@function
        .cfi_startproc
# %bb.0:                                # %entry
                                        # kill: def $x11 killed $x10
        zext.w  a1, a0
.Lpcrel_hi0:
        auipc   a2, %pcrel_hi(.LCPI0_0)
        addi    a2, a2, %pcrel_lo(.Lpcrel_hi0)
        ld      a2, 0(a2)
        mul     a2, a1, a2
        ori     a2, a2, 127
        mul     a1, a1, a2
.Lpcrel_hi1:
        auipc   a2, %pcrel_hi(.LCPI0_1)
        addi    a2, a2, %pcrel_lo(.Lpcrel_hi1)
        ld      a2, 0(a2)
        mul     a1, a1, a2
        add.uw  a0, a0, a1
        lui     a1, 1015920
        addiw   a1, a1, 1541
        slli    a1, a1, 16
        addi    a1, a1, 1027
        slli    a1, a1, 20
        add     a0, a0, a1
        ret
.Lfunc_end0:
        .size   _Z4sum2j, .Lfunc_end0-_Z4sum2j
        .size   .L_Z4sum2j$local, .Lfunc_end0-_Z4sum2j
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

@topperc
Copy link
Collaborator

topperc commented Jun 2, 2025

Please add a pre-commit to this PR showing the current code generation. Then amend your commit to show how this test is affected by this change.

topperc added a commit to topperc/llvm-project that referenced this pull request Jun 20, 2025
We need to pass the operand of LLA to GetSupportedConstantPool.

This replaces llvm#142292 with test from there added as a pre-commit
for both medlow and pic.
topperc added a commit that referenced this pull request Jun 21, 2025
…145112)

We need to pass the operand of LLA to GetSupportedConstantPool.
    
This replaces #142292 with test from there added as a pre-commit
for both medlow and pic.
topperc added a commit that referenced this pull request Jun 21, 2025
…tFromLoad. (#145112)"

With proper co-author.

Original message:

We need to pass the operand of LLA to GetSupportedConstantPool.

This replaces #142292 with test from there added as a pre-commit
for both medlow and pic.

Co-authored-by: Carl Nettelblad [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants